1.  INTRODUCTION


Multivariate  statistical  procedures  developed  under  normality  assumptions  are  well  advanced  (see,  for 
example,  Anderson  [1958]  and  Morrison  [1976]).  Some  of  these  procedures  claim  robustness  properties, 
especially in a large sample situation, that may serve to broaden their range of application. Nonparametric
methods  for  multivariate  analysis  have  been  pursued,  notably  by  Puri  and  Sen  (1971),  but  their  more 
complete  development  awaits  further  research. 

This  report  considers  multivariate  hypothesis  testing  in  both  one-sample  and  two-sample  situations. 
Comparable  univariate  procedures  do  not  extend  readily  to  higher  dimensions.  The  methods  considered 
are  based  on  the  properties  of  statistically  equivalent  blocks,  which  have  received  attention  from  a  number 
of  researchers,  including  Fraser  (1957)  in  a  tolerance  interval  context  and  Anderson  (1966)  and  Wilks 
(1962)  in  an  inferential  setting. 

In  section  2  the  mechanics  of  the  procedure,  along  with  the  supporting  mathematics,  are  given.  In 
section  3  statistically  equivalent  blocks  are  applied  in  one-  and  two-sample  situations.  In  section  4 
proximity-based  cutting  functions  are  introduced  and  applied  in  the  two-sample  setting. 

2.  STATISTICALLY  EQUIVALENT  BLOCKS 

The intent of the construction detailed in this section is to reduce the dimension of the problem in
order to exploit traditional univariate methods. This is begun by partitioning the p-dimensional real
product space $R^p$ containing the observations into subspaces or blocks. The partition is effected through
the use of functions $h: R^p \to R$ called cutting functions.

2.1 Construction of Blocks. Let $x_1, \ldots, x_n$ be n observations of a p-component random vector x with
distribution function F(x) and let $h_1(x), \ldots, h_n(x)$ be n (not necessarily distinct) real functions. The
functions $h_i(x)$, i = 1, ..., n, will be used to impose an order on the vectors $x_1, \ldots, x_n$. The value of the
subscript i of the function $h_i(x)$ does not imply an order of application; i.e., $h_1(x)$ is not necessarily applied
first, $h_2(x)$ second, etc. To emphasize this, a permutation of the integers 1, ..., n (denoted $k_1, \ldots, k_n$) will
indicate the order of application.


Let $x^{(k_1)}$ be the vector among $x_1, \ldots, x_n$ whose image under the mapping $h_{k_1}(x)$ is the $k_1$th order
statistic; i.e., $x^{(k_1)}$ is the observation x for which $k_1 - 1$ of the $h_{k_1}(x_i)$ are less than $h_{k_1}(x^{(k_1)})$ and $n - k_1$
are larger. The cutting function $h_{k_1}$ has an associated level set in $R^p$ consisting of

$$\{ x \mid h_{k_1}(x) = h_{k_1}(x^{(k_1)}) \},$$

which defines a boundary between two blocks, $B_{1 \ldots k_1}$ and $B_{k_1+1 \ldots n+1}$.

The union $B_{1 \ldots k_1} \cup B_{k_1+1 \ldots n+1} = \Omega$ (the sample space) and, in particular, $B_{1 \ldots k_1}$ will contain exactly
$k_1$ of the observations, and $B_{k_1+1 \ldots n+1}$ will contain the remaining $n - k_1$ observations.

This process is continued, applying the functions $h_{k_2}(x), \ldots, h_{k_n}(x)$ in sequence to further subdivide
$R^p$ until, after n iterations, there remain n + 1 blocks $B_1, \ldots, B_{n+1}$ with $B_j \cap B_k = \emptyset$, $j \neq k$, and
$\bigcup_j B_j = \Omega$. The function $h_{k_i}(x)$ that is applied at each stage, and the order of its application, is not
chosen arbitrarily. It will be seen that the order of application is dictated by power considerations of an
associated hypothesis test. To ensure that the ordering of $x_1, \ldots, x_n$ by $h_1(x), \ldots, h_n(x)$ is unique, excepting
a set of measure zero, the requirement that $h_i(x)$ is continuous when x is distributed according to F(x) is
imposed.

Before  proceeding  further,  an  illustrative  example  is  appropriate  (perhaps  imperative). 

Example 2.1. Consider the sample $X = \{x_1, \ldots, x_n\}$, which is displayed as Table 1.

Table 1. Sample X (p = 2, n = 20)

$$c_i(x) = c_i((x_1, x_2)) = x_i, \quad i = 1, 2$$

and the corresponding permutation,

$$(10, 5, 15, 3, 7, 12, 18, 2, 4, 6, 8, 11, 13, 16, 19, 1, 9, 14, 17, 20), \qquad (2.2)$$

will partition the sample space into n + 1 = 21 blocks.


The first entry in the permutation (2.2) is $k_1 = 10$. The cutting function from (2.1), $h_{10} = c_1$, is
applied to $x_1, \ldots, x_{20}$, and the pre-image of the 10th order statistic is determined to be $x^{(10)} = (4.38, 7.81)$.
This defines the first cut, which divides the sample space into two parts (blocks):

$$B_{1 \ldots 10} = \{ x \mid c_1(x) < 4.38 \}$$

and

$$B_{11 \ldots 21} = \{ x \mid 4.38 < c_1(x) \}.$$

The second entry in the permutation is $k_2 = 5$. The cutting function $h_5 = c_2$ is applied next, but only
to those sample points which are members of $B_{1 \ldots 10}$, since the 5th order statistic will be bounded above
by the 10th order statistic. The application of $h_5$ subdivides $B_{1 \ldots 10}$ into $B_{1 \ldots 5}$ and $B_{6 \ldots 10}$. In the third
iteration, $k_3 = 15$ and $h_{15} = c_2$ partitions $B_{11 \ldots 21}$ into $B_{11 \ldots 15}$ and $B_{16 \ldots 21}$ under the same argument; the
15th order statistic is bounded below by the 10th order statistic. These blocks are depicted in Figure 2.

This process is continued until each of the $h_i$ has been applied. The sample space will be partitioned
into 21 blocks, as depicted in Figure 3. In Figure 3, some representative blocks have been labelled. This
example is referenced in following sections.
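The mechanics of section 2.1 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the figures: the names `Node`, `build_blocks`, and `block_index` are hypothetical, the cut point itself is treated as lying on the boundary (so it is stored in neither child region), and a new point that falls exactly on a boundary is assigned to the upper block.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Cut = Callable[[Point], float]


@dataclass
class Node:
    lo: int                                   # lowest block index in this region
    hi: int                                   # highest block index in this region
    points: List[Point] = field(default_factory=list)  # sample points strictly inside
    split: Optional[int] = None               # k of the cut applied here, if any
    cut: Optional[Cut] = None                 # cutting function h_k
    level: Optional[float] = None             # h_k evaluated at the cut point x^(k)
    lower: Optional["Node"] = None
    upper: Optional["Node"] = None


def build_blocks(sample: Sequence[Point], cutting: Dict[int, Cut],
                 order: Sequence[int]) -> Tuple[Node, List[Point]]:
    """Apply h_{k_1}, ..., h_{k_n} in the given order.

    Returns the tree of cuts and the ordered sample x^(1), ..., x^(n).
    """
    n = len(sample)
    root = Node(1, n + 1, list(sample))
    ordered: Dict[int, Point] = {}
    for k in order:
        node = root
        while node.split is not None:         # find the region whose index range holds k
            node = node.lower if k < node.split else node.upper
        h = cutting[k]
        pts = sorted(node.points, key=h)
        idx = k - node.lo                     # 0-based rank of the cut point in this region
        ordered[k] = pts[idx]
        node.split, node.cut, node.level = k, h, h(pts[idx])
        node.lower = Node(node.lo, k, pts[:idx])
        node.upper = Node(k + 1, node.hi, pts[idx + 1:])
        node.points = []
    return root, [ordered[k] for k in range(1, n + 1)]


def block_index(root: Node, z: Point) -> int:
    """Return i such that z falls in block B_i."""
    node = root
    while node.split is not None:
        node = node.lower if node.cut(z) < node.level else node.upper
    return node.lo
```

With permutation (2.2), the first call to the loop sorts all 20 points by $h_{10}$ and takes the 10th, the second sorts only the points below that cut by $h_5$ and takes the 5th, and so on, mirroring the narrative of Example 2.1.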

2.2  Mathematical  Foundation.  Thus  far,  discussion  has  been  limited  to  the  mechanics  of  block 
construction,  without  any  motivation  for  engaging  in  such  an  exercise.  Toward  this  end,  consider 

$$V_k = \int_{B_k} dF(x), \quad k = 1, \ldots, n + 1.$$

The coverage $V_k$ is the probability assigned to block $B_k$ under the distribution F(x). It can be shown (see
Anderson [1966] and Wilks [1962], p. 238) that the coverages are distributed jointly as an n-variate
Dirichlet distribution:


(2.3) 


The symmetry of the coverages $V_1, \ldots, V_{n+1}$ in equation (2.3) leads to the corresponding
sample blocks $B_1, \ldots, B_{n+1}$ being referred to as statistically equivalent blocks.
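As a point of reference, the Dirichlet result for coverages is commonly written as below. The notation is an assumption and may differ from that of (2.3); it is the standard $D(1, \ldots, 1; 1)$ form, in which the last coverage is determined by the other n.

```latex
f(v_1, \ldots, v_n) = n!, \qquad v_k > 0, \quad \sum_{k=1}^{n} v_k < 1,
\qquad V_{n+1} = 1 - \sum_{k=1}^{n} V_k ,
\qquad E[V_k] = \frac{1}{n+1}, \quad k = 1, \ldots, n+1 .
```

The density is constant on the simplex and does not involve F, which is the sense in which the blocks are exchangeable and the result is distribution free.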
As a direct consequence of the manner in which the blocks are constructed, the expression

$$u^{(j)} = \sum_{k=1}^{j} V_k, \quad j = 1, \ldots, n,$$

produces variates $u^{(1)}, \ldots, u^{(n)}$, which are distributed as the order statistics of a uniform random variable
on [0, 1] (Anderson [1966]). The joint distribution of the variates is an ordered n-variate Dirichlet
distribution (Wilks [1962], p. 236).

The  importance  of  these  results  is  twofold.  First,  the  results  do  not  depend  upon  the  specific  form 
of  the  distribution  function  F(x)  and,  as  such,  are  distribution  free.  Second,  testing  for  variates  uniformly 
distributed  may  be  accomplished  through  established  procedures  (see,  for  example,  D’Agostino  and 
Stephens  [1986]). 

3.  MULTIVARIATE  HYPOTHESIS  TESTING 

Statistically  equivalent  blocks  find  use  in  both  one-  and  two-sample  situations.  In  the  one-sample 
case,  a  multivariate  goodness-of-fit  test  may  be  accomplished.  In  the  two-sample  case,  a  test  for  identical 
distributions  follows  immediately  from  the  procedure  by  which  the  blocks  are  formed. 

3.1  One  Sample.  Section  2.2  provides  the  theoretical  foundation  for  a  multivariate  goodness-of-fit 
procedure. Given a random sample $x_1, \ldots, x_n$ from an unknown distribution F(x), and a completely
specified distribution G(x), the hypotheses

$$H_0: F(x) = G(x) \quad \forall x$$

and

$$H_1: F(x) \neq G(x)$$


may  be  established. 

Example 3.1. Suppose that the data presented in Table 1 are to be tested for conformity to a bivariate
uniform distribution on the square [0, 10] x [0, 10]. The coverages of the blocks constructed in
accordance with cutting functions (2.1) and permutation (2.2) under a bivariate uniform assumption are
presented in Table 2. In the univariate case, the scalars $x_1, \ldots, x_n$ are naturally ordered $x^{(1)} < \cdots < x^{(n)}$,
and a test of $H_0: F(x) = G(x)$ may be accomplished by determining whether $u^{(k)} = G(x^{(k)})$, k = 1, ..., n,
are distributed uniformly on [0, 1].


Table 2. Block B_i Coverages

   i     V_i         i     V_i
   1    .0536       12    .0041
   2    .0314       13    .0071
   3    .0365       14    .0529
   4    .0040       15    .0363
   5    .0274       16    .0351
   6    .0790       17    .0302
   7    .1267       18    .0615
   8    .0298       19    .0537
   9    .0057       20    .1526
  10    .0439       21    .0935
  11    .0350

The argument extends to the multivariate case, but the construction of an empirical cumulative distribution
function for a random vector x does not hold as much intuitive appeal. The statistically equivalent blocks,
once constructed from $x_1, \ldots, x_n$ as described in section 2.1, can be renumbered without loss of generality
and accumulated to obtain alternative representations of a cumulative distribution function.
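The accumulation of coverages can be illustrated with the values in Table 2. The sketch below is only an arithmetic check under the block numbering used in that table; the function name `cumulative_coverages` is hypothetical.

```python
from itertools import accumulate

# Coverages V_1, ..., V_21 as listed in Table 2.
V = [0.0536, 0.0314, 0.0365, 0.0040, 0.0274, 0.0790, 0.1267, 0.0298,
     0.0057, 0.0439, 0.0350, 0.0041, 0.0071, 0.0529, 0.0363, 0.0351,
     0.0302, 0.0615, 0.0537, 0.1526, 0.0935]


def cumulative_coverages(v):
    """Return u^(1), ..., u^(n): partial sums of the first n of n + 1 coverages."""
    return list(accumulate(v[:-1]))


u = cumulative_coverages(V)
print(round(sum(V), 4))   # total probability over all 21 blocks
print(round(u[9], 4))     # u^(10): probability mass of blocks B_1, ..., B_10
```

The tenth partial sum equals 0.438, which agrees with the first cut at $c_1(x) = 4.38$ on a square of side 10. A different renumbering of the blocks would give a different, equally valid, sequence of partial sums.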

In consideration of this, a test based on probability assignment to intervals (blocks) without regard to
location seems more appropriate. Fisher (1929) provides such a test, in which the null hypothesis is
rejected if

$$\Pr\{\max_j V_j > V\} = (N+1)(1-V)^N - \binom{N+1}{2}(1-2V)^N + \cdots + (-1)^{k-1}\binom{N+1}{k}(1-kV)^N, \qquad \frac{1}{k+1} < V < \frac{1}{k}, \quad k = 1, \ldots, N, \qquad (3.1)$$

falls below a specified level of significance $\epsilon$.
The test can be carried out by replacing V on the right side of (3.1) by $\max_j V_j$ and evaluating the
expression. The computed value is the observed significance level, p. From Table 2, $\max_j V_j = V_{20} =
0.1526$; the observed significance level is p = 0.63, far too large to reject the null hypothesis of bivariate
uniformity.
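The evaluation of (3.1) can be sketched as follows. This is a minimal illustration, assuming N is the sample size (so that there are N + 1 coverages) and that the alternating sum stops at the last k for which 1 - kV is positive; the function name `fisher_max_coverage_pvalue` is hypothetical.

```python
from math import comb, floor


def fisher_max_coverage_pvalue(v_max: float, n: int) -> float:
    """Pr{max_j V_j > v_max} for n + 1 coverages, following equation (3.1)."""
    if v_max <= 0.0:
        raise ValueError("v_max must be positive")
    if v_max >= 1.0:
        return 0.0
    k_max = min(n + 1, floor(1.0 / v_max))
    total = 0.0
    for k in range(1, k_max + 1):
        term = 1.0 - k * v_max
        if term <= 0.0:                       # remaining terms vanish
            break
        total += (-1) ** (k - 1) * comb(n + 1, k) * term ** n
    return total


# Largest coverage in Table 2 with n = 20 observations.
p = fisher_max_coverage_pvalue(0.1526, 20)
print(round(p, 2))
```

For V = 0.1526 only the terms k = 1, ..., 6 are positive, and the first two dominate; the result rounds to the observed significance level reported in Example 3.1.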

3.2  Two  Sample.  The  decomposition  of  a  p-dimensional  sample  space  into  statistically  equivalent 
blocks  allows  for  a  ready  extension  to  a  two-sample  test.  Given  independent  random  samples 
$X = \{x_1, \ldots, x_n\}$ and $Y = \{y_1, \ldots, y_m\}$ from unknown distributions F and G respectively, the hypotheses

$$H_0: F(x) = G(x) \quad \forall x$$

and

$$H_1: F(x) \neq G(x)$$


may  be  tested. 

The mechanism for performing this test is perhaps more straightforward than for the one-sample test.
The creation of the statistically equivalent blocks $B_i$, i = 1, ..., n + 1, imposes an ordering of the
observations in X that was denoted by $x^{(1)}, \ldots, x^{(n)}$. Having created the blocks based on the sample X,
a relative ordering of the observations in X and Y, denoted as "$\ll$", follows immediately according to the
rule that $y_j \in B_i$ iff $x^{(i-1)} \ll y_j \ll x^{(i)}$. Under the null hypothesis, there should be no
significant difference in the rank ordering of the observations from X (or Y) in the combined sample.
Therefore, any test based on relative ranking of the observations is appropriate for use in testing the
hypothesis of identical distributions.

Example 3.2. Consider the sample $Y = \{y_1, \ldots, y_m\}$ shown in Table 3.

Figure 4 displays the blocks which were created in section 2.1 based on the sample X, with the points
corresponding to sample Y overlaid. Based on the blocks into which the Y observations fall, the
combined sample may be ordered as follows:


Table 3. Sample Y (p = 2, m = 20)

   i    y_i                 i    y_i
   1    (13.90, 2.13)      11    (6.33, 4.44)
   2    (7.71, 6.89)       12    (11.86, 0.83)
   3    (9.67, 6.20)       13    (7.42, 2.31)
   4    (7.56, 0.90)       14    (9.15, 3.94)
   5    (10.39, 2.43)      15    (12.73, 6.12)
   6    (13.47, 0.45)      16    (6.58, 3.04)
   7    (14.55, 0.01)      17    (7.34, 3.69)
   8    (7.46, 0.18)       18    (8.12, 2.59)
   9    (11.25, 1.08)      19    (7.79, 3.68)
  10    (13.08, 2.37)      20    (5.65, 2.40)

Figure 4. Blocks constructed from X with Y overlaid.

Any rank-based hypothesis test may be applied to this relative ordering. The Smirnov two-sample test
provides a statistic of 0.55 and a corresponding p-value of 0.004. The Mann-Whitney test, which is
primarily a test for difference in location, provides a p-value of less than 0.002.
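The relation between the Smirnov statistic and its p-value can be checked with an exact path-counting argument. The sketch below assumes equal sample sizes (n = m = 20, as in Tables 1 and 3) and a two-sided test; whether the reported p-value was obtained this way is an assumption, and the function name `smirnov_exact_pvalue` is hypothetical.

```python
from math import comb


def smirnov_exact_pvalue(h: int, n: int) -> float:
    """Two-sided Pr{D >= h/n} for two samples of equal size n under H0.

    Counts monotone lattice paths from (0, 0) to (n, n) that keep |i - j| < h;
    each path corresponds to one equally likely interleaving of the two samples.
    """
    if h <= 0:
        return 1.0
    # ways[j] = number of admissible paths reaching (i, j) for the current row i
    ways = [1 if j < h else 0 for j in range(n + 1)]   # row i = 0
    for i in range(1, n + 1):
        new = [0] * (n + 1)
        for j in range(n + 1):
            if abs(i - j) < h:
                new[j] = ways[j] + (new[j - 1] if j > 0 else 0)
        ways = new
    return 1.0 - ways[n] / comb(2 * n, n)


# A statistic of 0.55 with n = m = 20 corresponds to h = 11.
p_smirnov = smirnov_exact_pvalue(11, 20)
print(round(p_smirnov, 3))
```

The same function with h = 9 (a statistic of 0.45) gives approximately 0.034, the value quoted for the proximity-based ordering in Example 4.1.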
4.  PROXIMITY-BASED  CUTTING  FUNCTIONS


Cutting functions appearing in the literature are most often component-wise: $h(x) = x_i$, a choice which
facilitates presentation in two dimensions. Another class of cutting functions, based on proximity to the
observations $X = \{x_i\}$ (where "proximity" is in a Euclidean metric sense), is now considered.

Let $z \in R^p$, the p-dimensional sample space containing the observations $X = \{x_i\}$ and $Y = \{y_j\}$.
Consider a function $D: R^p \to R^n$ defined by $D(z) = \{d_i\}$, where the $d_i$ are the Euclidean distances from
z to each of the observations $x_i \in X$. Without loss of generality, assume that $d_1 \le d_2 \le \cdots \le d_n$. Now,
for any real function $H: R^n \to R$ form the composite h(z) = H(D(z)). The function h will be called a
proximity-based cutting function (PBCF) because the value taken on reflects the proximity of z to the
members of X.

Consider the expression

$$H(D(z)) = \sum_{i=1}^{n} a_i d_i, \quad a_i \ge 0. \qquad (4.1)$$

It is clear that the order statistics $d^{(1)}, \ldots, d^{(n)}$, along with all linear combinations, are special cases of
equation (4.1). In section 2.1, the requirement that $h_i(x)$ be continuous when x is distributed continuously
was imposed. Since the distance functions $d_i$ are clearly continuous, the expression (4.1) is also
continuous, and the legitimacy of h(z) = H(D(z)) as a cutting function is established.
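Equation (4.1) translates directly into code. The following is a minimal sketch, assuming the weights $a_i$ are supplied in the order of the sorted distances $d_1 \le \cdots \le d_n$; the function name `pbcf` is hypothetical.

```python
from math import dist
from typing import Sequence


def pbcf(z: Sequence[float], sample: Sequence[Sequence[float]],
         weights: Sequence[float]) -> float:
    """Evaluate h(z) = sum_i a_i d_i, with d_1 <= ... <= d_n the sorted
    Euclidean distances from z to the sample points."""
    if len(weights) != len(sample):
        raise ValueError("one weight per sample point is required")
    if any(a < 0 for a in weights):
        raise ValueError("weights must be nonnegative")
    d = sorted(dist(z, x) for x in sample)
    return sum(a * di for a, di in zip(weights, d))


# Sum of the distances to the second and third closest sample points.
X_demo = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(pbcf((0.0, 0.0), X_demo, (0, 1, 1)))
```

Setting a single weight to one and the rest to zero recovers an individual order statistic of the distances; the weights (0, 1, 1, 0, ..., 0) give the function $d_2 + d_3$ used in Example 4.1.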

The  motivation  for  examining  this  class  of  functions  is  as  follows.  In  the  two-sample  case,  the 
question  "How  closely  does  a  random  sample  X  resemble  a  random  sample  Y?"  is  posed.  Univariate  rank 
tests  address  this  problem  following  an  argument  that,  under  a  null  hypothesis  of  no  difference,  the  sample 
X  will  be  interspersed  among  the  sample  Y.  The  choice  of  PBCF  is  an  attempt  to  extend  this  argument 
to  higher  dimensions.  Appropriately  chosen  PBCFs  should  partition  the  multidimensional  space  into 
statistically  equivalent  blocks  that  will  distinguish  when  the  observations  under  consideration  are  indeed 
in  and  among  their  counterpart. 
Example 4.1. The choice of cutting functions in Example 2.1 gave level sets which were straight lines
(or hyperplanes in higher dimension). The nature of the level sets and statistically equivalent blocks is
not as intuitive for the PBCFs. Consider $H(D(z)) = d_2 + d_3$. This function maps a point $z \in R^p$ to the
sum of the distances to the second and third closest $x \in X$. For the data from Example 2.1, this cutting
function produces the level sets shown in Figure 5.


Figure 5. Blocks corresponding to $h(z) = d_2 + d_3$.


The  statistically  equivalent  blocks  do  not  resemble  "blocks"  at  all  for  this  choice  of  cutting  function. 
Rather,  the  blocks  are  the  areas  bounded  by  level  sets.  This  cutting  function  may  be  used  to  repeat  the 
hypothesis  test  detailed  in  Example  3.2.  In  Figure  6,  the  sample  Y  has  been  overlaid  on  the  blocks  from 
Figure 5. Some of the level sets have been removed to allow the observations $y \in Y$ to be distinguished.
A  relative  ordering  of  the  two  samples  is  again  created.  The  Smirnov  test  yields  a  statistic  of  0.45  for  this 
ordering,  which  corresponds  to  a  p-value  of  0.034. 

Since, in Example 3.2, the Smirnov test returned a p-value of 0.004, it would be unlikely to observe
an even higher level of significance for these data and this hypothesis regardless of the choice of cutting
function. In most practical situations, either value (0.004 or 0.034) is sufficient to abandon the null
hypothesis. The intent is that PBCFs lead to a more powerful test of hypothesis, and while the notion that
the sample X be interspersed among the sample Y under $H_0$ is not incorrect, it is incomplete. The
requirement that Y be interspersed among the sample X is equally important.

Figure 6. Sample X blocks with Y overlaid.

Consider the situation depicted in Figure 7, in which a subset of observations from Y that retain
their integrity in the combined data set has been superimposed on the level curves from Figure 5. Again,
some level curves have been removed in order that the values from Y may be seen more clearly. The Smirnov
statistic in this instance is 0.35, corresponding to a p-value of 0.264; the test has lost power against this
type of alternative.

The problem is that the mixture of x's and y's in the combined sample is not homogeneous. A direct
approach to dealing with this situation is to reverse the roles of X and Y; i.e., construct blocks according
to the sample Y and consider the dispersion of the sample X. The two tests of hypothesis can then be
combined with a level of significance determined as follows. If the individual tests have significance
levels $\alpha_1$ and $\alpha_2$, respectively, then the combined test has significance level $\alpha \le \alpha_1 + \alpha_2$. To establish
a level of significance $\alpha$, it will suffice to set $\alpha_1 = \alpha_2 = \alpha/2$. If the individual tests have observed
significance levels (a.k.a. critical levels) of $p_1$ and $p_2$, then the observed significance level for the
combined test is $p = 2 \min(p_1, p_2)$.

Figure 8 illustrates blocks constructed from Y with X overlaid. The Smirnov statistic is 0.683,
corresponding to a p-value of 0.0008. The critical level of the combined test procedure is then

p  <  0.0016. 
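The combination rule is simple enough to state as a one-line function. In this sketch the name `combined_significance` is hypothetical, the cap at 1 is an added safeguard not stated in the text, and pairing the Figure 7 value (0.264) with the Figure 8 value (0.0008) as the two individual levels is an assumption.

```python
def combined_significance(p1: float, p2: float) -> float:
    """Observed significance level of the combined test: 2 * min(p1, p2),
    capped at 1 so the result remains a valid probability."""
    return min(1.0, 2.0 * min(p1, p2))


p_combined = combined_significance(0.264, 0.0008)
print(p_combined)
```

Only the smaller of the two levels affects the result, so the doubling is the price paid for being allowed to look at both orderings.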
Figure 8. Blocks constructed from Y with X overlaid.
5.  SUMMARY  AND  CONCLUSIONS


The sample X, introduced in Example 2.1, was taken from a uniform distribution on the square
[0, 10] x [0, 10]. Not surprisingly, the test of hypothesis of bivariate uniformity detailed in Example 3.1
produced an observed significance level p = 0.63, suggesting good agreement between data and hypothesis.

The sample Y, introduced in Example 3.2, was taken from a uniform distribution on the square
[5, 15] x [0, 10]. Again, both tests of hypotheses presented in Examples 3.2 and 4.1 detected the change
in location even though the marginal distributions of the parent populations for X and Y coincide on the
ordinate. The Mann-Whitney test appeared more sensitive to the shift in location.

The situation depicted in Figure 7 is one in which those observations from Y contained in
[5, 10] x [0, 10] have been superimposed on the level curves from Figure 5. In an attempt to overcome an attendant loss of
power,  the  roles  of  X  and  Y  were  interchanged  and  a  combined  test  was  performed.  The  combined  test 
had  an  observed  significance  level  of  0.0016. 

The concept of PBCF holds promise for the analysis of multivariate data. Additional research is
clearly in order. The power of the procedure has not been investigated; scaling of the variates in relation
to the PBCF was not addressed; and only a single PBCF was illustrated. Computationally intensive
methods for statistical data analysis are a natural extension of the powerful and economical computing
resources that are readily available to the researcher and will continue to receive emphasis as a research
area in mathematical statistics.